Abstract

Data structures that realize a dictionary are characterized by three
basic operations: (1) Insert a new entry (key and value). (2) Search by
a key, returning the associated value. (3) Delete an entry.

Known realizations are hashing schemes and various types of search
trees. The time complexity of the fundamental operations is measured as a
function of the number of entries n; the (binary) size s of a key is usually
not considered. For search trees the expected time as well as the upper
limit of time is O(log n). The LiFo dictionary is a new implementation in
which the time limits for Insert and for Search are both a linear function
of the length s of the key used, that is, O(s). The LiFo dictionary
furthermore provides two additional basic operations: (4) open environment
and (5) close environment, each performed in constant time. Close
environment needs only one assignment, by which it restores exactly the same
internal situation as before the last call of open environment. This
feature cannot be realized by any of the data structures mentioned above,
hence the prefix LiFo (= last in, first out), another name for stack. This
ability is highly suitable for software applications that frequently perform
local symbol-to-value binding throughout multiple levels of environments.
Examples of such applications are Lisp interpreters and D. Knuth's TeX program.
1 Introduction



To illustrate the principle of this data structure in a simplified fashion, we
use a game model, demonstrated by an example. Assume that the president of
some association keeps a dictionary of the members. The keys of the
dictionary are the members' names; the value assigned to a key is the
member's year of birth. Assume the current state of the dictionary consists
of three entries: ALFRED (1940), ALBERT (1955), PETRA (1960). This current
state is modelled in the game by a route pattern of square cards, each
with one letter written on it, as the following figure shows.



Sorry, Figure 1 cannot be presented! 

(the author will send a sheet of 3 Figures on request). 



We label card A with > as the starting square and mark each end with the
encircled value (the birth year). Using another set of these cards, we place
beside the pattern a so-called argument line, whose letters form the word to
be searched (the input). The search is simulated by the following procedure:
first we put a piece b onto the starting square of the above route pattern
and a piece a onto the first square of the argument line. This is the
starting configuration. A configuration of the game in progress is
characterized by the squares occupied by the two pieces; we shall refer to
them as the current squares in the dictionary and in the argument line. A
successful final configuration is reached if a covers the last square of the
argument line, b is on a square with a circled value attached, and the
letters of both squares agree. The answer of the search is then the circled
value next to b. The change from one configuration to the next is described
as follows:



If the letters of the current squares are the same, both pieces move
one square straight ahead.

Otherwise, if a square adjoins the square of piece b on its left-hand
side, then b moves left to this square and a keeps its position; if
there is no square on the left, however, the game fails, that is, the
argument does not appear in the dictionary.

If, instead of searching, the dictionary is to be extended by
inserting the input as a key, then we continue in the last case as
follows. We place a card with the same letter as that below piece a
so that it adjoins, on the left, the square where piece b resides.
We turn the card by a small angle to the left, then we append a copy
of the part of the argument line in front of piece a as a new branch.
To finish inserting, a further argument is required, carrying the
value to be associated with the key. (The value is the encircled
year in the example above.)



If we extend our example by inserting an entry for PETER (1965) we obtain: 
Sorry, Figure 2 cannot be presented!



This game is meant only to illustrate how the mechanism of the data structure
works in principle. We observe the following drawback in this model: from a
certain size upwards, considerable multiple branching will arise, so that
search becomes inefficient. A simple rule helps to overcome this difficulty:
rely on a binary coding of the letters and play the game with the alphabet of
0 and 1! Let us record that access by a matching chain of bits (input strip)
is realized by a sequence of moves in the game model. Each move comprises
testing the agreement of the two current (focused) bits of the input strip
and of the search tree; if the test yields false, the focus in the search
tree migrates to the left-side branch (which exists for a matching input) and
the test is repeated; if the test finally yields true, both focuses move one
node ahead (to the next bit). For a correct input the focus migrates through
the branch of the search tree that carries the same sequence of bits as the
input strip. It is significant that each entry except the first (the oldest
one) is mapped to exactly one branching node in the search tree. Now let us
design an adequate data structure.
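The step from letters to bits can be sketched as follows. The text does not
fix a particular binary code; the 8-bit ASCII coding and the helper names
to_bits and first_difference are assumptions made only for this illustration.

```python
def to_bits(word: str) -> str:
    """Encode a word as a bit string, 8 bits per letter (assumed ASCII coding)."""
    return "".join(format(byte, "08b") for byte in word.encode("ascii"))


def first_difference(a: str, b: str) -> int:
    """Return the 0-based index of the first position where a and b differ.

    If one string is a prefix of the other, the length of the shorter
    string is returned.
    """
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


alfred, albert = to_bits("ALFRED"), to_bits("ALBERT")
# Bit position at which the two routes part in the binary game.
branch_at = first_difference(alfred, albert)
```

Under this coding ALFRED and ALBERT agree on the 16 bits of AL and on the
first five bits of the third letter. Since a bit has only two values, at most
two routes can continue from any position of the pattern.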



2 Data Structure of Dictionary 

An entry (key + value) is recorded as a data type Entry with field components.
The dictionary as a whole is an organized collection of objects of type Entry
together with a pool for keys, that is, a compound of objects that are
interconnected by references, sometimes called pointers. One component to be
considered is the bit string of the key of an entry. With the exception of the
first entry, the right part of the key, that is, the continuation after the
branching off from the key of an older entry, is sufficient for the searching
algorithm. We shall record this bit string as a component of Entry referred to
by the selector name remkey. Now let us imagine further branching-off nodes
located at certain places of remkey. Remember that each entry is mapped to
exactly one branching node in the search tree, namely the one that leads into
the entry. Hence it seems advisable to record only the first branching off by
a component attached to the structure of Entry and to delegate further
branching offs to the entry reachable from the first one. This is managed by
introducing three further components for Entry: dist, the place of the first
branching in the bit string remkey; branch, a pointer to the entry (object of
type Entry) that is reachable by the first branching off; and finally link, a
pointer that records further branching offs belonging to the bit string remkey
of the entry that was the focus (the current Entry) before the last
assignment focus ← focus↑.branch.

A pointer or reference to a certain object is the address or position of the
object within the digital storage. A declaration ↑Entry lexpos stipulates
lexpos as a pointer to an object of type Entry; the object itself is denoted
by lexpos↑.

The meaning of the components branch and link is illustrated by the following
sketch, which refers to our game model. Assume the dictionary contains 4
entries, the addresses of which are denoted by $L_1, L_2, L_3, L_4$, and the
corresponding route pattern is:



Sorry, Figure 3 cannot be presented!



Then we have $L_2 = L_1{\uparrow}.\mathit{branch}$,
$L_3 = (L_1{\uparrow}.\mathit{branch}){\uparrow}.\mathit{link}$, and
$L_4 = ((L_1{\uparrow}.\mathit{branch}){\uparrow}.\mathit{link}){\uparrow}.\mathit{link}$.
If for an object $X$ of type Entry there are $n+1$ entries corresponding to
the branching offs on the bit string $X.\mathit{remkey}$ according to the
above model, that is, each key associated with such an entry (component
.remkey) exhibits its first difference to $X.\mathit{remkey}$ at one of the
branching locations, then the entries are
$X.\mathit{branch}{\uparrow}$, $X.\mathit{branch}{\uparrow}.\mathit{link}{\uparrow}$, ...,
$X.\mathit{branch}{\uparrow}(.\mathit{link}{\uparrow})^{n}$. The branching
positions can be read out of these objects: the component $Y.\mathit{dist}$
for each $Y = X.\mathit{branch}{\uparrow}(.\mathit{link}{\uparrow})^{j}$
($j \le n$) states that the leftmost difference between the key of $X$ and
that of $Y$ is the bit at position $Y.\mathit{dist}$ in $X.\mathit{remkey}$
(although remkey in general is only a right piece of the real key).
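The chain of branching offs described above can be walked with a short loop.
The following is a minimal sketch in Python, assuming object references in
place of storage addresses; the field name and the dist values are invented
for the example, and only branch, link and dist come from the text.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Entry:
    name: str                         # label used only for this illustration
    dist: int = 0                     # branching position in the parent's remkey
    branch: Optional["Entry"] = None  # first entry branching off this entry
    link: Optional["Entry"] = None    # next branching off of the same parent


def branching_entries(x: Entry) -> Iterator[Entry]:
    """Yield x.branch, x.branch.link, x.branch.link.link, ... in order."""
    y = x.branch
    while y is not None:
        yield y
        y = y.link


# Four entries wired as in the text: L2 = L1.branch, L3 = L2.link, L4 = L3.link.
# The dist values are made up for the example.
l4 = Entry("L4", dist=7)
l3 = Entry("L3", dist=5, link=l4)
l2 = Entry("L2", dist=2, link=l3)
l1 = Entry("L1", branch=l2)

names = [y.name for y in branching_entries(l1)]
positions = [y.dist for y in branching_entries(l1)]
```

Here names lists the entries that branch off the remkey of L1, and positions
lists the places in that remkey where they leave it.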

2.1 Preliminary Data Types 

define type Bit ={0,1}; define type Bool = {false, true}; 
define type BitString = array [*] of Bit ; 

/* BitString s means s = (s[i])_{i < length(s)} */

define type NatN = Elements of N; 

define type ResultType = any type you may need below; 

define type Location = position of storage referring to some object; 

2.2 Data Types and References 

For each data type T we consider the type ↑T of references to an object of
type T (location of the object within storage). If a variable p is declared to
be of type ↑T (usually by writing "↑T p"), then p↑ denotes the object referred
to by p. The relationship between ↑T and Location is that under the
declarations "↑T p, q; Location r;" the commands "r ← p; q ← r;" yield p = q.

2.3 Procedures of Storage Allocation

↑T valued /* T may be any type */
create(T, var Location freeloc) =
   /* freeloc is considered as the top of a stack */
begin
   ↑T result ← freeloc;
   Increase freeloc by the size of T;
   return(result);
end.

↑T valued /* T may be any type */
create_copy(T v, var Location freeloc) =
begin
   ↑T result ← freeloc;
   Increase freeloc by the size of T;
   result↑ ← v;
   return(result);
end.
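The stack discipline behind create and create_copy can be sketched in Python.
This is an illustration under stated assumptions, not the storage model of the
text: locations are list indices, every object occupies one cell, create
receives the object to store, and the class name Storage is invented here.

```python
import copy


class Storage:
    """Stack-like storage: locations are list indices, freeloc is the top."""

    def __init__(self):
        self.cells = []
        self.freeloc = 0  # first unused location

    def create(self, obj):
        """Place obj at the top of the stack and return its location."""
        loc = self.freeloc
        if loc < len(self.cells):
            self.cells[loc] = obj      # reuse a cell released earlier
        else:
            self.cells.append(obj)
        self.freeloc = loc + 1         # simplification: every object has size 1
        return loc

    def create_copy(self, value):
        """Place a shallow copy of value at the top and return its location."""
        return self.create(copy.copy(value))


store = Storage()
a = store.create({"kind": "entry"})
mark = store.freeloc                   # remember the top of the stack
b = store.create_copy([0, 1, 1])
store.freeloc = mark                   # one assignment releases everything above mark
c = store.create_copy("reused")
```

Resetting freeloc to an earlier value releases all younger objects at once,
which is the property the LiFo dictionary relies on.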



First we study a simpler version of the data structure Entry, which does
not yet realize the LiFo property but differs from the final version
essentially only in the operation of adding a new entry by procedure insert.
The logical connectives ∧ and ∨ are used with semantics that differ from
those used in logic: they are interpreted as a sequential and and or,
respectively. That is, if the first operand of ∨ yields true within an
evaluation, then the evaluation of the expression built upon ∨ ceases with
value true without any computation of the second operand. The same applies
to ∧ with false in place of true. For a conditional instruction such as
if (findpos ≠ nil ∧ d > findpos↑.dist) ..., this prevents the computation of
the undefined expression findpos↑.dist in case findpos = nil.
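Many programming languages provide the sequential connectives directly. A
minimal sketch in Python, whose operators and and or stop evaluating as soon
as the result is known; the class Node and the function first_not_below are
names invented for this illustration.

```python
class Node:
    """A chain element with the two fields the loop condition needs."""

    def __init__(self, dist, link=None):
        self.dist = dist
        self.link = link


def first_not_below(findpos, d):
    """Skip chain elements whose dist is smaller than d.

    The right operand findpos.dist is evaluated only when findpos is not
    None, so an empty chain or the end of the chain raises no error.
    """
    while findpos is not None and d > findpos.dist:
        findpos = findpos.link
    return findpos
```

Swapping the two operands of and would raise an AttributeError at the end of
the chain, which corresponds to the undefined expression mentioned above.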

2.4 Data Structure Dictionary /* simple (non-LiFo) Version */

1  define type Entry = structure
2  begin
3     ↑BitString remkey;
4     NatN dist;
5     ↑Entry branch, link;
6     ResultType val;
7  end
8

10 define type Dictionary =
11    structure begin ↑Entry gate; Location freeloc; end ;
12
13 /* Procedure search computes the dictionary position findpos for a
      bit string x that corresponds to an entry in the dictionary L, the key
      of which is x, provided that this key occurs in L. The proper answer
      after a successful search then is findpos↑.val. The answer whether the
      search was successful is the return value of procedure search.
      Procedure insert relies on search. If search(...) returns value true,
      then findpos↑.val needs only to be rewritten by the new associated
      value (a parameter of insert). In the other case L will be enlarged by
      one element (of type Entry). Particularly for this purpose search is
      provided with the output parameters insertpos and d. */

14
15 Bool valued
16 search(Dictionary L, BitString x,
17        var ↑Entry findpos, insertpos, var NatN d) =
18 begin
19    findpos ← L.gate;
20    if (findpos = nil) then return(false); /* empty dictionary */
21    while (true) do /* terminated by return */
22    begin
23       if (findpos↑.remkey↑ = x) then return(true);
24       if ( one of x and findpos↑.remkey↑ is a left substring of the other )
25          then d ← min(length(x), length(findpos↑.remkey↑)) + 1;
26          else let d be the smallest d such that x[d] ≠ (findpos↑.remkey↑)[d];
27
28       insertpos ← findpos; /* ≠ nil (on account of 20 and 55) */
29       findpos ← findpos↑.branch;
30       while (findpos ≠ nil ∧ d > findpos↑.dist) do
31       begin
32          insertpos ← findpos;
33          findpos ← findpos↑.link;
34       end ;
35       if (findpos = nil ∨ d ≠ findpos↑.dist)
36          then return(false); /* x is not a valid key */
37
38       x ← (x[d + j])_{j < length(x) - d}; /* = shift_right(x, d) */
39       /* search continues at the entry the branch off leads to, i.e. after
            position d of remkey↑. */
40    end ;
41 end. /* search */
42
43 insert(Dictionary L, BitString x, ResultType v) =
44 begin
45    ↑Entry findpos, insertpos, newpos; NatN d;
46    if (search(L, x, findpos, insertpos, d))
47       then findpos↑.val ← v
48       else
49       begin
50          newpos ← create(Entry, L.freeloc);
51          newpos↑.dist ← d;
52          if (insertpos = nil) then L.gate ← newpos;
53          /* note that insertpos = nil iff L.gate = nil */
54          if (insertpos↑.branch = findpos)
55             then insertpos↑.branch ← newpos
56             else insertpos↑.link ← newpos;
57          newpos↑.branch ← nil;
58          newpos↑.link ← findpos;
59          newpos↑.remkey ← create_copy((x[d + j])_{j < length(x) - d}, L.freeloc);
60          newpos↑.val ← v;
61       end ;
62 end. /* insert */
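The procedures above can be tried out with the following Python sketch. It is
one possible reading of 2.4, not a transcription: keys are strings of "0" and
"1", positions are 0-based, d is taken as the length of the common prefix in
both cases of lines 24-26, and object references replace storage addresses.
The helper names common_prefix_len and _walk are invented here, and the sketch
records explicitly whether the predecessor of a new entry is its parent or a
sibling instead of comparing pointers.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Entry:
    remkey: str                       # rest of the key after the branching point
    dist: int                         # branching position in the parent's remkey
    val: Any = None
    branch: Optional["Entry"] = None  # first entry branching off this remkey
    link: Optional["Entry"] = None    # next branching off of the same parent


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class Dictionary:
    def __init__(self):
        self.gate = None  # first (oldest) entry

    def _walk(self, x):
        """Return (found, parent, prev_sibling, next_sibling, d, rest_of_x)."""
        node = self.gate
        while node is not None:
            if node.remkey == x:
                return node, None, None, None, 0, x
            d = common_prefix_len(x, node.remkey)
            prev, cur = None, node.branch
            while cur is not None and d > cur.dist:   # sequential "and"
                prev, cur = cur, cur.link
            if cur is None or cur.dist != d:          # x is not a valid key
                return None, node, prev, cur, d, x[d:]
            x, node = x[d:], cur                      # continue in the branch
        return None, None, None, None, 0, x           # empty dictionary

    def search(self, x):
        found = self._walk(x)[0]
        return (True, found.val) if found is not None else (False, None)

    def insert(self, x, v):
        found, parent, prev, nxt, d, rest = self._walk(x)
        if found is not None:
            found.val = v                             # key exists: overwrite
            return
        new = Entry(remkey=rest, dist=d, val=v, link=nxt)
        if parent is None:
            self.gate = new
        elif prev is None:
            parent.branch = new
        else:
            prev.link = new
```

Each pass of the outer loop consumes d bits of x, and the inner loop skips
only siblings with distinct dist values below d, which mirrors the argument
of the O(s) bound.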

2.5 Observation The worst-case time complexity of procedure search of 2.4
is $O(s)$, where $s$ = length of $x$.

Proof: The instructions within the while loop at lines 21-40 are to be
considered as follows. Assume the loop is passed $r$ times, the associated
values of $d$ being $(d_j)_{j<r}$. Looking at lines 25, 26 and 38 we have
$d_0 + d_1 + \cdots + d_{r-1} \le s = \mathrm{length}(x)$. More than two
subsequent $d_i$'s cannot be equal to $0$ ($d_i = d_{i+1} = 0$ occurs only
if keys $u$, $u0v$, $u1w$ occur), therefore $r < 2s$. As to the inner loop of
lines 30-34, the construction of the link components according to insert
ensures that $\mathit{anypos}({\uparrow}.\mathit{link})^{m}{\uparrow}.\mathit{dist}$
is a strictly increasing sequence limited by $d$. In this way the inner loop
is repeated $p_j$ times if

$0 < \mathit{findpos}{\uparrow}.\mathit{dist} < \mathit{findpos}{\uparrow}.\mathit{link}{\uparrow}.\mathit{dist} < \cdots < \mathit{findpos}({\uparrow}.\mathit{link})^{p_j-1}{\uparrow}.\mathit{dist} < d_j$

If we take
$d_{j,k} = \mathit{findpos}({\uparrow}.\mathit{link})^{k+1}{\uparrow}.\mathit{dist} - \mathit{findpos}({\uparrow}.\mathit{link})^{k}{\uparrow}.\mathit{dist}$,
then each $d_{j,k} > 0$ and $d_j = d_{j,0} + \cdots + d_{j,p_j-1}$. If we
concatenate the tuples $(d_{j,k})_{k<p_j}$ and let $(\delta_i)_{i<h}$ be the
result of the concatenation, we conclude, as above for the sequence of the
$d_j$'s, that $h < s$, where $h$ is the sum of the numbers of repetitions of
the small loop over all traversals of the big loop. Hence the amount of
computation time due to the small loop is $O(s)$, the number of repetitions
of the big loop is $O(s)$, and the instructions outside the small loop either
require constant time or are limited by $d_j$; as the sum of the $d_j$'s is
not greater than $s$, the whole algorithm of search has worst-case time
complexity $O(s)$.
■

2.6 Data Structure LiFo-Dictionary 

Each component of the preceding data structure is included, partly
modified. Two additional procedures open_environment and
close_environment are introduced. The data type Dictionary
gets the new component pop of type ↑Dictionary, so that a
stack of Dictionary frames is maintained.
1  define type Entry = as in 2.4 Dictionary ;
2
3  define type Dictionary =
4     structure begin
5        ↑Entry gate;
6        Location freeloc;
7        ↑Dictionary pop
8     end ;
9
10 open_environment( var ↑Dictionary DictPtr) =
11 begin
12    ↑Dictionary OldDictPtr ← DictPtr;
13    Location NewFreeLoc ← DictPtr↑.freeloc;
14    DictPtr ← create(Dictionary, NewFreeLoc);
15    DictPtr↑.gate ← create_copy(OldDictPtr↑.gate↑, NewFreeLoc);
16    DictPtr↑.pop ← OldDictPtr;
17    DictPtr↑.freeloc ← NewFreeLoc;
18 end.
19
20 close_environment( var ↑Dictionary DictPtr) =
21    DictPtr ← DictPtr↑.pop
22
23 /* Note that we associate our dictionary with a variable
      ↑Dictionary DictPtr. The LiFo storage for the whole compound
      is characterized by its top position DictPtr↑.freeloc. Thus the single
      assignment of close_environment suffices to restore the state before
      the last call of open_environment. */

24
25 Bool valued
26 search(↑Dictionary DictPtr, BitString x) =
27 begin
28    ↑Entry findpos, insertpos; NatN d;
29    /* these are no longer required for insert, */
30    /* we use search of 2.4 */
31    return(Dictionary.search(DictPtr↑, x, findpos, insertpos, d));
32 end.

33
34 insert(↑Dictionary DictPtr, BitString x, ResultType v) =
35 begin
36    ↑Entry findpos, insertpos, newpos;
37    NatN d; Bool key_exists;
38
39    findpos ← DictPtr↑.gate;
40    while (findpos ≠ nil) do
41    begin
42       if (findpos < DictPtr)
43          then findpos ← create_copy(findpos↑, DictPtr↑.freeloc);
44       key_exists ← (findpos↑.remkey↑ = x);
45       if (key_exists) then breakloop;
46       if ( one of x and findpos↑.remkey↑ is a left substring of the other )
47          then d ← min(length(x), length(findpos↑.remkey↑)) + 1;
48          else let d be the smallest d such that x[d] ≠ (findpos↑.remkey↑)[d];
49       /* Note that findpos < DictPtr iff findpos has been created before
            the last call of open_environment, which caused the creation of
            DictPtr. If this is true, then findpos shall be replaced by a
            copy of it which is located atop of DictPtr. This copy will be
            removed at the next call of close_environment. */
50       insertpos ← findpos; findpos ← findpos↑.branch;
51       if (findpos ≠ nil ∧ findpos < DictPtr) then
52       begin
53          newpos ← create_copy(findpos↑, DictPtr↑.freeloc);
54          insertpos↑.branch ← newpos;
55          findpos ← newpos;
56       end ;
57       while (findpos ≠ nil ∧ d > findpos↑.dist) do
58       begin
59          insertpos ← findpos;
60          findpos ← findpos↑.link;
61          if (findpos ≠ nil ∧ findpos < DictPtr) then
62          begin
63             newpos ← create_copy(findpos↑, DictPtr↑.freeloc);
64             insertpos↑.link ← newpos;
65             findpos ← newpos;
66          end ;
67       end ;
68       if (findpos = nil ∨ d ≠ findpos↑.dist) then breakloop;
69       /* with key_exists = false, i.e. x is not a valid key */
70
71       x ← (x[d + j])_{j < length(x) - d}; /* = shift_right(x, d) */
72       /* searching continues in branching off (after position d of remkey↑) */
73    end ;
74
75    if (key_exists)
76       then findpos↑.val ← v
77       else
78       begin
79          newpos ← create(Entry, DictPtr↑.freeloc);
80          newpos↑.dist ← d;
81          if (insertpos = nil) then DictPtr↑.gate ← newpos;
82          /* note that insertpos = nil iff DictPtr↑.gate = nil */
83          if (insertpos↑.branch = findpos)
84             then insertpos↑.branch ← newpos
85             else insertpos↑.link ← newpos;
86          newpos↑.branch ← nil;
87          newpos↑.link ← findpos;
88          newpos↑.remkey ← create_copy((x[d + j])_{j < length(x) - d},
89                                       DictPtr↑.freeloc);
90          newpos↑.val ← v;
91       end ;
92 end. /* insert */

There is a stack of items of type Dictionary with top DictPtr. These
items are connected by the downward link pop. The next segment below DictPtr↑
is DictPtr↑.pop↑. Each element (of type Entry or BitString) that we can access
from DictPtr↑.pop↑ was created before DictPtr↑. By the definition of create
and create_copy, an element p↑ is older (created earlier) than another
element q↑ if and only if p < q. Hence the address value of each element
accessible from DictPtr↑.pop↑ (by iterated application of .branch↑, .link↑)
is less than DictPtr.

Part 39-73 of procedure insert is a modification of search from 2.4, which
additionally copies the traversed entries findpos↑ and chains the copies
parallel to the originals by branch and link. The same considerations as for
search yield that the worst-case time complexity of procedure insert is also
O(s).
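The interplay of frames, copying and the address comparison can be tried out
with the following Python sketch. It rests on several assumptions that are
not part of the text: storage is a Python list whose indices serve as
addresses, every object occupies one cell, keys are strings of "0" and "1"
with 0-based positions, and the gate entry is copied on the first insert
rather than inside open_environment. The names LifoDictionary, Frame, _store,
_create and _own are invented for the illustration.

```python
import copy
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Entry:
    remkey: str
    dist: int
    val: Any = None
    branch: Optional[int] = None   # addresses are indices into cells
    link: Optional[int] = None


@dataclass
class Frame:
    gate: Optional[int]
    freeloc: int
    pop: Optional[int]


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class LifoDictionary:
    def __init__(self):
        self.cells = [Frame(gate=None, freeloc=1, pop=None)]
        self.top = 0                              # plays the role of DictPtr

    def _store(self, loc, obj):
        if loc < len(self.cells):
            self.cells[loc] = obj                 # reuse a released cell
        else:
            self.cells.append(obj)

    def _create(self, obj):
        frame = self.cells[self.top]
        loc = frame.freeloc
        self._store(loc, obj)
        frame.freeloc = loc + 1
        return loc

    def _own(self, loc):
        """Copy an entry above the current frame if it is older than the frame."""
        if loc is not None and loc < self.top:
            return self._create(copy.copy(self.cells[loc]))
        return loc

    def open_environment(self):
        old, loc = self.cells[self.top], self.cells[self.top].freeloc
        self._store(loc, Frame(gate=old.gate, freeloc=loc + 1, pop=self.top))
        self.top = loc

    def close_environment(self):
        self.top = self.cells[self.top].pop       # the single assignment

    def search(self, x):
        node = self.cells[self.top].gate
        while node is not None:
            e = self.cells[node]
            if e.remkey == x:
                return True, e.val
            d = common_prefix_len(x, e.remkey)
            cur = e.branch
            while cur is not None and d > self.cells[cur].dist:
                cur = self.cells[cur].link
            if cur is None or self.cells[cur].dist != d:
                return False, None
            x, node = x[d:], cur
        return False, None

    def insert(self, x, v):
        frame = self.cells[self.top]
        frame.gate = node = self._own(frame.gate)
        if node is None:                          # empty dictionary
            frame.gate = self._create(Entry(x, 0, v))
            return
        while True:
            e = self.cells[node]                  # e always belongs to this frame
            if e.remkey == x:
                e.val = v
                return
            d = common_prefix_len(x, e.remkey)
            e.branch = cur = self._own(e.branch)
            prev = None
            while cur is not None and d > self.cells[cur].dist:
                prev, c = cur, self.cells[cur]
                c.link = cur = self._own(c.link)
            if cur is None or self.cells[cur].dist != d:
                new = self._create(Entry(x[d:], d, v, link=cur))
                if prev is None:
                    e.branch = new
                else:
                    self.cells[prev].link = new
                return
            x, node = x[d:], cur
```

Entries below the current frame are never modified, only copied, so after
close_environment the older frame sees exactly the structure it had before,
and its unchanged freeloc lets later insertions overwrite the discarded cells.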



